RDM Validation

RDM Validation Overview

After the data is loaded to staging schema, RDM validation stored procedure is invoked to validate the external codes.

MDM_RDM_VALIDATION Arguments:

  • DB_NAME: Database name

  • LAST_UPDATE_DATE: It is the UPDATE_DTT of staging tables used to filter records

  • MAPPING_JSON:

    Copy
    {

      "dataSource": "CRMMI",

      "sourceSchema": "ODP_CORE_STAGING",

      "targetSchema": "MDM_CANONICAL",

      "enableRdmLookup": true,

      "dateFormat": "MM-DD-YYYY",

      "mapping": [

        {

          "sourceTable": "MDM_Identifier",

          "targetTable": "C_CUSTOMER",

          "isLookupTable": true,

          "fields": [

            {

              "sourceField": "DW_ID",

              "targetField": "CUSTOMER_ID",

              "targetFieldType": "String"

            },

            {

              "sourceField": "COUNTRY",

              "targetField": "COUNTRY",

              "targetFieldType": "String",

              "lookupHeader": "LKUP_IMS_COUNTRY",

              "rdmValidationFailureAction": "Error-Entity"

            },

    {

              "sourceField": "TYPE_CODE",

              "targetField": "TYPECODE",

              "targetFieldType": "String",

              "lookupHeader": "LKUP_IMS_TYPE",

              "rdmValidationFailureAction": "Error-Warning"

            },

            {

              "sourceField": "CUSTOMER_ID",

              "targetField": "CUSTOMER_ID",

              "targetFieldType": "reference",

              "sourceParentTable": "MDM_GENERICIMPORT_HCP",

              "sourceParentKey": "CUSTOMER_ID"

            }

          ]

        }

      ]

    }

    Top Level Attributes:

    dataSource: Source Name of the records to be processed

    sourceSchema: Staging schema

    targetSchema: Canonical Schema

    enableRdmLookup(boolean): If true then perform RDM validation

    Mapping Level Attributes:

    sourceTable: Table name in staging schema on which the validation is to perform

    isLookupTable(boolean): If true then perform RDM validation

    Field Level Attributes:

    sourceField: Column name of the table in staging

    targetField: Column name of the table in canonical

    targetFieldType: column datatype

    lookupHeader: lookup name (For Example: LKP_IMS_COUNTRY). Objects with in the ‘fields’ array having property ‘lookupHeader’ and ‘rdmValidationFailureAction’ are eligible for RDM validation.

    rdmValidationFailureAction: The action that should be taken for the failed validation records. There are four types of rdmValidationFailureAction:

    → Error-Entity: Profile should not be moved to canonical schema including parent tables if applicable. It is to reject complete profile when any lookup is not found.

    (For Example: If a record in Identifier table fails validation then it'scorresponding record in parent table HCP is also considered failed)

    → Error-Sub-Entity: Only the failed profile should not be moved to canonical, it is to reject only child entities (not complete profile) when lookup for child entity attribute is not found.

    (For Example: If a record in Identifier table is failed then it’s not moved to canonical. No impact to parent record)

    → Error-Attribute: Record can be moved to canonical even it fails validation. it is not to reject the profile when look up is missing for in-significant attribute. Instead load the profile with null value for missing lookup attribute.

    (For Example: Even a column GENDER fails validation, it can be moved to canonical with empty value)

    → Error-Warning: Record can be moved to canonical even it fails validation but with source codes. This cannot reject attribute/child records/main profile when lookup is missing. Instead load the complete profile including all missing codes and just log the warning in RDM reject table so that it can be extracted in Reject Report.

    (For Example: Even a column GENDER fails validation, it can be moved to canonical with GEN.U value which doesn't exists in RDM)

    If no rdmValidationFailureAction is provided, then ‘Error-Entity’ is picked by default.

    sourceParentTable and sourceParentTable: Objects containing these two attributes indicates it's a child table, it's parent table is given in sourceParentTable and is linked by sourceParentKey column.

    RDM_VALIDATION table holds the records that failed validation. Multiple column failures are merged to single record in this table

    RDM_VALIDATION_DETAIL table holds the records that failed validation in detail that is, it flattens the records from RDM_VALIDATION table by converting each column failure in a row.

    RDM_VALIDATION_DETAIL_HIST table contains the archived records from RDM_VALIDATION_DETAIL table.

    From the JSON, table MDM_SPECIALTY is identified to validate. It’s joined with parent table MDM_HCP based on CUSTOMER_ID to get Parent DW_ID. The resultant is left outer join with RDM_VALIDATION log table to get records that failed previously. The records from MDM_SPECIALTY table is filtered based on timestamp passed as argument to stored procedure. Further, the resultant is joined with RDM table to identify the mismatch. Failed records are inserted to TEMP_RDM_VALIDATION table. TEMP_RDM_VALIDATION table is compared with RDM_VALIDATION table to identify new inserts and updates. Once the json is read and processed, the output is loaded to RDM_VALIDATION table. This table holds the record until the validation is successful for a profile. The table used to move data from staging to canonical is RDM_VALIDATION. RDM_VALIDATION_DETAIL table consists of same columns as RDM_VALIDATION table but maintains history. The SQL prepared to identify mismatch is executed for each table eligible to validate. Records of RDM_VALIDATION_DETAIL table are archived after the invalidated records are resolved to RDM_VALIDATION_DETAIL_HITORY table.

  • SAMPLE SQL

    (WITH RDM AS (

    SELECT

    DISTINCT b.ID, b.DATA_SOURCE_CD, b.EXTERNAL_CODE AS CODE

    FROM

    MDM_CANONICAl.R_CODE_ITEMS a

    INNER JOIN IDP_IQVIADEV03_OAIDP_DEVI_USV_ENV_DWH.MDM_CANONICAl.R_EXTERNAL_ITEMS b ON

    a.id = b.id

    WHERE

    b.DOWNSTREAM_DEFAULT = 'true'

    AND b.STATUS_CODE = 'ACTV'), ID AS (

    SELECT

    Parent.DW_ID AS PARENT_DW_ID, ID.*, 'CRMMI' IncomingSourceName, log.SOURCE_NAME logSOURCE_NAME, log.VALIDATION_STATUS, log.LAST_UPD_DT logLAST_UPD_DT, log.CREATION_DT logCREATION_DT, log.Last_Run_DT logLast_Run_DT

    FROM

    IDP_IQVIADEV03_OAIDP_DEVI_USV_ENV_DWH.ODP_CORE_STAGING.TEST_Specialty ID

    INNER JOIN IDP_IQVIADEV03_OAIDP_DEVI_USV_ENV_DWH.ODP_CORE_STAGING.TEST_HCP Parent ON

    ID.CUSTOMER_ID = Parent.CUSTOMER_ID

    LEFT OUTER JOIN IDP_IQVIADEV03_OAIDP_DEVI_USV_ENV_DWH.ODP_CORE_STAGING.RDM_VALIDATION log ON

    (log.PARENT_OBJECT_ID = Parent.DW_ID)

    AND SOURCE_OBJECT = 'TEST_Specialty'

    WHERE

    ( ID.UPDATE_DTT > to_timestamp('2020.04.24 04:00:00', 'YYYY.MM.DD HH:MI:SS') )

    OR (log.VALIDATION_STATUS IN('Error-Entity', 'Error-Sub-Entity', 'Error-Attribute')))

    SELECT

    DISTINCT 'TEST_Specialty' OBCTECT_NAME, NVL(logSOURCE_NAME, IncomingSourceName) SOURCE_NAME, PARENT_DW_ID AS PARENT_OBJECT_ID,

    CASE

    WHEN RDM_SPECIALTY.code IS NULL THEN 'Error-Entity'

    ELSE 'SUCCESS'

    END AS VALIDATION_STATUS, DW_ID AS CHILD_OBJECT_ID, decode(RDM_SPECIALTY.code, NULL, 'SPECIALTY' || ',', '') AS WARNING_COLUMNS, array_construct(object_construct('column', 'SPECIALTY', 'LookUpType', 'LKUP_IMS_SPECIALTY', 'value', ID.SPECIALTY)) AS VALIDATION_DETAILS, logLAST_UPD_DT LAST_UPD_DT, logCREATION_DT CREATION_DT, logLast_Run_DT Last_Run_DT

    FROM

    ID

    LEFT OUTER JOIN RDM RDM_SPECIALTY ON

    (RDM_SPECIALTY.DATA_SOURCE_CD = 'CRMMI'

    AND RDM_SPECIALTY.code = ID.SPECIALTY

    AND RDM_SPECIALTY.ID LIKE '%LKUP_IMS_SPECIALTY%' )

    WHERE

    ( VALIDATION_STATUS IN ('Error-Entity', 'Error-Sub-Entity', 'Error-Attribute')

    OR ( (VALIDATION_STATUS IS NULL

    OR VALIDATION_STATUS = 'SUCCESS')

    AND ID.SPECIALTY IS NOT NULL

    AND RDM_SPECIALTY.code IS NULL) ))

    RDM_VALIDATION table Structure:

    SOURCE_OBJECT (VARCHAR): Table name of the failed record

    PARENT_OBJECT_ID (VARCHAR): DW_ID of PARENT table

    CHILD_OBJECT_ID (VARCHAR): DW_ID of CHILD table (empty when the record is from parent table)

    VALIDATION_STATUS (VARCHAR): one among the three values of ‘rdmValidationFailureAction’. During delta, if a record is validated successfully then it'svalue is ‘SUCCESS’

    WARNING_COLUMNS (VARCHAR): Column names of the table that failed validation

    VALIDATION_DETAILS (VARIANT): contains lookup type and external value in JSON format

    LAST_UPD_DT (TIMESTAMP_LTZ): date when the record was last updated w.r.t. VALIDATION_DETAILS column

    CREATION_DT (TIMESTAMP_LTZ): creation date of a record

    LAST_RUN_DT (TIMESTAMP_LTZ): last processed date of a record

    Stored Procedure MDM_RDM_VALIDATION has following functions:

    → executeSQLStr: To execute queries

    → buildJoin: To prepare left outer join part of query dynamically

    → buildWhere: To build where part of query dynamically

    → colNames: build the decode part of query to get rejected column names

    → colValidationStatus1: build the When part of query which decides validation status

    → arrConstruct: build the array part of query which prepares JSON of failed fields

    Copy
    [ {
    "LookUpType": "LKUP_IMS_COUNTRY",
    "column": "COUNTRY",
    "value": "USA"
    }]

    Sample pipeline where RDM validation is implemented

    RDM validation is configured under below task for GIF source:

    Sample stored procedure call

    CALL ODP_CORE_LOG.MDM_RDM_VALIDATION(@HCP_Mapping,@Task_SystemName,false);

    HCP_Mapping: Configuration JSON

    Task_SystemName: Name of the process to pick duration from control log table.

    RDM reporting template

    The detail and summary report consists of records that failed validation.

    RDM_Reject_Summary

    This report has the summary of RDMs rejected as shown below:

    Count column specifies number of records rejected for the corresponding LOOKUP_TYPE.

    RDM_Reject_Detail

    This report has the detailed information of rejected records as shown below:

    DAYSOPEN columns specifies the number of days since the reject occurred.

Troubleshooting

From the validation_detail column, get lookuptype and value. Check RDM for the same and take appropriate action.

One time data Correction(Optional)

In case of switching the rdmValidationFailureAction configurations from Error-Entity to Error-Warning or Error-SubEntity to Error-Warning or Error-Atribute to Error-Warning, you can perform below steps to sync the existing data.

Set the last run date to older date or delete it and trigger the pipeline. this can move all the source data to Reltio including missing code values and the missing codes are extracted in reject report.

Delete from ODP_CORE_LOG.MDM_PROCESS_CONTROL_LOG WHERE PIPELINE_CODE = <PROCESS_NAME>;